Skip to content

[Draft] Add compression ratio calculation and per-column compression stats (#18184)#18185

Draft
johnsolomonj wants to merge 29 commits into
apache:masterfrom
johnsolomonj:feature/compression-stats-tracking
Draft

[Draft] Add compression ratio calculation and per-column compression stats (#18184)#18185
johnsolomonj wants to merge 29 commits into
apache:masterfrom
johnsolomonj:feature/compression-stats-tracking

Conversation

@johnsolomonj
Copy link
Copy Markdown
Contributor

@johnsolomonj johnsolomonj commented Apr 13, 2026

Labels: feature, release-notes, observability

Summary

Draft implementation for the PEP proposed in #18184. Kept as draft pending design review on the issue.

Adds compression ratio tracking and per-column compression stats to Pinot's existing table size and metadata APIs:

  • Track uncompressed forward index sizes at write time in all raw column writers (BaseChunkForwardIndexWriter subclasses, VarByteChunkForwardIndexWriterV4/V5/V6, CLPForwardIndexCreatorV2)
  • Persist uncompressed size and compression codec to metadata.properties per column
  • Expose compressionStats, columnCompressionStats, and storageBreakdown on both GET /tables/{table}/size and GET /tables/{table}/metadata
  • Add TABLE_COMPRESSION_RATIO_PERCENT and TABLE_TIERED_STORAGE_SIZE controller gauges with tier lifecycle management
  • Gated by table-level indexingConfig.compressionStatsEnabled flag (default: off, zero overhead when disabled)

Design document

See #18184 for the full PEP including motivation, prior art, API response structure, and known corner cases.

Key design decisions

  • Per-value tracking: Uncompressed size tracked at individual put*() callsites, capturing raw ingested data size without chunk headers or alignment padding
  • Shared codec resolution: ForwardIndexType.resolveCompressionType() handles CLP codec variants, used by both BaseSegmentCreator and ForwardIndexHandler
  • Dict columns out of scope: Dictionary-encoded columns report -1 for uncompressed size since writers only see dictionary IDs
  • Backward compatible: New metadata fields are additive; old segments gracefully return defaults

Test plan

  • Unit tests for writer uncompressed size tracking (fixed-byte, var-byte V1-V3, V4/V5/V6)
  • Unit tests for CLP V2 sub-stream size aggregation
  • Unit tests for ForwardIndexType.resolveCompressionType() codec resolution
  • Unit tests for ForwardIndexHandler compression stats persistence on reload
  • Controller aggregation tests (dict sentinel preservation, negative ratio guards, partial coverage)
  • Integration test for end-to-end compression stats API response
  • Verify zero overhead when compressionStatsEnabled = false

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Apr 13, 2026

Codecov Report

❌ Patch coverage is 57.87234% with 99 lines in your changes missing coverage. Please review.
✅ Project coverage is 55.82%. Comparing base (e76da0e) to head (0741c48).
⚠️ Report is 1 commits behind head on master.

Files with missing lines Patch % Lines
...local/segment/creator/impl/BaseSegmentCreator.java 4.16% 22 Missing and 1 partial ⚠️
...ent/creator/impl/fwd/CLPForwardIndexCreatorV2.java 0.00% 19 Missing ⚠️
...ocal/segment/index/loader/ForwardIndexHandler.java 29.16% 15 Missing and 2 partials ⚠️
.../local/segment/index/forward/ForwardIndexType.java 0.00% 8 Missing ⚠️
...ment/index/forward/ForwardIndexCreatorFactory.java 68.18% 4 Missing and 3 partials ⚠️
...ocal/segment/creator/impl/ColumnIndexCreators.java 0.00% 5 Missing ⚠️
...inot/segment/spi/creator/IndexCreationContext.java 0.00% 5 Missing ⚠️
.../writer/impl/FixedByteChunkForwardIndexWriter.java 50.00% 2 Missing and 2 partials ⚠️
...ot/segment/spi/creator/SegmentGeneratorConfig.java 25.00% 3 Missing ⚠️
...gment/local/io/writer/impl/VarByteChunkWriter.java 0.00% 2 Missing ⚠️
... and 4 more

❗ There is a different number of reports uploaded between BASE (e76da0e) and HEAD (0741c48). Click for more details.

HEAD has 1 upload less than BASE
Flag BASE (e76da0e) HEAD (0741c48)
unittests2 1 0
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #18185      +/-   ##
============================================
- Coverage     63.73%   55.82%   -7.91%     
+ Complexity     1932      823    -1109     
============================================
  Files          3292     2568     -724     
  Lines        201503   148791   -52712     
  Branches      31320    24043    -7277     
============================================
- Hits         128429    83065   -45364     
+ Misses        62794    58624    -4170     
+ Partials      10280     7102    -3178     
Flag Coverage Δ
custom-integration1 100.00% <ø> (ø)
integration 100.00% <ø> (ø)
integration1 100.00% <ø> (ø)
integration2 0.00% <ø> (ø)
java-21 55.82% <57.87%> (-7.91%) ⬇️
temurin 55.82% <57.87%> (-7.91%) ⬇️
unittests 55.82% <57.87%> (-7.91%) ⬇️
unittests1 55.82% <57.87%> (+0.04%) ⬆️
unittests2 ?

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@johnsolomonj johnsolomonj force-pushed the feature/compression-stats-tracking branch 3 times, most recently from b9e573e to 7667a13 Compare May 12, 2026 23:13
@johnsolomonj johnsolomonj force-pushed the feature/compression-stats-tracking branch from 86fbf8d to 88a9887 Compare May 19, 2026 14:51
…to table size API

This feature enables tracking and reporting of forward index compression
effectiveness across Pinot segments. When `compressionStatsEnabled` is set
in table config's indexing config, segment creation records uncompressed
forward index sizes and compression codec in metadata.properties. The
server-side table size endpoint now returns per-segment and per-column
raw/compressed forward index sizes. The controller aggregates these into
table-level compression ratio metrics (raw/compressed), with partial
coverage tracking for mixed-version clusters. Three new ControllerGauge
metrics (TABLE_COMPRESSION_RATIO_PERCENT, TABLE_RAW_FORWARD_INDEX_SIZE_PER_REPLICA,
TABLE_COMPRESSED_FORWARD_INDEX_SIZE_PER_REPLICA) are emitted for monitoring.
ForwardIndexHandler is updated to persist compression metadata during
segment reload operations (compression type change and dict-to-raw conversion).
…feature

- Add 6 new test files covering writer-level tracking, segment creation,
  corner cases, ForwardIndexHandler reload, and integration tests for both
  offline and realtime (Kafka) ingestion paths
- Merge redundant dual-loop in TableSizeReader into a single pass over
  server info, improving performance during table size aggregation
- Fix offline integration test teardown to properly wait for table data
  manager removal before stopping servers
- Wrap second table cleanup in offline test in finally block to prevent
  resource leaks on assertion failure
…tier breakdown, and stale metadata cleanup

- Wrap flat compression fields in nested CompressionStats DTO with @JsonInclude(NON_NULL)
- Add StorageBreakdown with per-tier segment count and size (always reported)
- Add per-column ColumnCompressionDetail with aggregated sizes, ratio, and codec (MIXED when codecs differ across segments)
- Gate compressionStats on tableConfig.indexingConfig.compressionStatsEnabled; suppress from JSON when OFF
- Fix isPartialCoverage: now correctly returns true when 0 segments have stats but non-missing segments exist
- Clear stale forwardIndex.compressionCodec and forwardIndex.uncompressedSizeBytes on raw-to-dict reload
- Support null values in SegmentMetadataUtils.updateMetadataProperties to clear properties
- Add TABLE_TIERED_STORAGE_SIZE gauge; emit tier metrics always; clear compression+tier gauges when flag OFF
- Add testRawToDictClearsCompressionStats, testCompressionStatsNullWhenFlagOff, per-column/tier assertions
- Update integration tests for nested compressionStats JSON structure
…API, and comprehensive tests

- Gate uncompressed size tracking in forward index writers via compressionStatsEnabled flag
  (ForwardIndexCreatorFactory, ForwardIndexHandler, all raw index creators)
- Add per-column compression stats aggregation to server TablesResource and
  ServerSegmentMetadataReader with MIXED codec detection
- Extend TableMetadataInfo DTO with columnCompressionStats field (NON_NULL suppression)
- Fix integration test schema name mismatch for disabled-stats table
- Add 7 new test classes: IndexingConfigCompressionFlagTest, SegmentGeneratorConfigPropagationTest,
  CLPForwardIndexCreatorV2StatsTest, ServerTableSizeReaderRawBytesTest,
  TableMetadataReaderCompressionTest, TableMetadataInfoCompressionTest,
  ForwardIndexHandlerCompressionStatsTest updates
…feature flag

- Replace Map<String, ColumnCompressionStatsInfo> with List<ColumnCompressionStatsInfo> array
  containing all required fields: column, uncompressedSizeInBytes, compressedSizeInBytes,
  compressionRatio, codec, hasDictionary, indexes
- Gate columnCompressionStats in both server endpoints (TablesResource metadata,
  TableSizeResource size) on compressionStatsEnabled feature flag
- Add controller-side suppression in PinotTableRestletResource for safety against
  old servers that may still emit stats when flag is off
- Fix forward index size accumulation: use getIndexSizeFor(StandardIndexes.forward())
  directly per segment instead of cumulative variable
- Sort columnCompressionStats array by column name for deterministic output
- Update all tests and DTOs for the new array schema
…, and metadata endpoint gaps

Resolve default compression codec (LZ4/PASS_THROUGH) in BaseSegmentCreator and
ForwardIndexHandler when table config leaves chunkCompressionType null. Include
dictionary-encoded columns in columnCompressionStats with hasDictionary=true.
Clear stale controller compression metrics when no segments report stats. Suppress
zeroed compressionStats for dict-only tables. Add compressionStats summary to the
metadata endpoint aggregated from per-column data. Add tests for all fixes.
…ession

- Move columnCompressionStats to top-level field on TableSubTypeSizeDetails
  instead of nesting inside CompressionStats inner class
- Remove unused ColumnCompressionDetail inner class; use shared
  ColumnCompressionStatsInfo DTO from pinot-common
- Fix metadata endpoint field names to match size endpoint:
  rawForwardIndexSizePerReplicaInBytes, compressedForwardIndexSizePerReplicaInBytes
- Suppress compressionStats and columnCompressionStats for dict-only tables
  and when feature flag is OFF
- Dictionary columns report -1 for uncompressedSizeInBytes to distinguish
  from zero-size raw columns on both size and metadata endpoints
- Use LinkedHashSet for index deduplication in per-column aggregation
…adata endpoint

- Create CompressionStatsSummary DTO in pinot-common with
  rawForwardIndexSizePerReplicaInBytes, compressedForwardIndexSizePerReplicaInBytes,
  and compressionRatio
- Create StorageBreakdownInfo DTO in pinot-common with per-tier count and size
- Add both as @JsonInclude(NON_NULL) fields on TableMetadataInfo with
  backward-compatible constructors
- Server computes compressionStats summary and storageBreakdown during segment
  iteration and includes them in TableMetadataInfo response
- Controller aggregates both fields across servers in ServerSegmentMetadataReader
  (divides by numReplica like other fields)
- Remove manual addCompressionStatsSummary() JSON manipulation from
  PinotTableRestletResource; controller now just strips fields when flag is OFF
- Fix use-after-release: tier accumulation moved inside try block before
  segments are released in the finally block
…racking accuracy

- Remove ObjectNode.remove() pattern from PinotTableRestletResource; pass
  compressionStatsEnabled flag through TableMetadataReader and
  ServerSegmentMetadataReader so DTOs are constructed with null at creation
  time when the flag is OFF (storageBreakdown always preserved)
- Add segmentsWithStats, totalSegments, isPartialCoverage to
  CompressionStatsSummary so metadata endpoint has identical JSON schema
  to the size endpoint; populate during server and controller aggregation
- Fix VarByteChunkForwardIndexWriterV4 uncompressed size tracking: track
  raw value byte lengths in putBytes() instead of buffer.remaining() in
  write() which included chunk-format header overhead
- Move segment stats counting inside the try block in TablesResource to
  avoid accessing segment metadata after segments are released
- Add test for compression stats suppression when flag is disabled
Old segments lacking uncompressed size metadata (pre-flag segments) were
contributing their compressed forward index size to the denominator while
adding nothing to the numerator, deflating the compression ratio. Now
only dictionary-encoded columns enter the compressed-only accumulation
path; raw columns on old segments without codec/uncompressed data are
skipped entirely from both numerator and denominator.
- Track uncompressed size per value in putInt/putLong/putFloat/putDouble
  (FixedByteChunkForwardIndexWriter) and putBytes (VarByteChunkForwardIndexWriter)
  instead of using chunk buffer remaining bytes which overcounts due to
  chunk-internal offset tables
- Gate server size endpoint compression field collection on feature flag
  so no metadata access or accumulation occurs when flag is OFF
- Exclude old raw segments lacking a persisted compression codec from
  per-column stats to prevent sentinel values leaking into aggregation
- Guard Math.max in per-column accumulation against INDEX_NOT_FOUND (-1)
  sentinel values
- Preserve per-column compression stats for dict-only tables even when
  no segments have raw forward index data (segmentsWithStats == 0)
- Rename TABLE_COMPRESSION_RATIO_HUNDREDTHS gauge to
  TABLE_COMPRESSION_RATIO_PERCENT for consistency
- Hoist IndexService.getInstance() outside per-column inner loop
- Fix negative compression ratio for dict columns in metadata endpoint:
  require both uncompressed > 0 and compressed > 0 before computing
  ratio in ServerSegmentMetadataReader (per-column and summary level)
- Track emitted tier gauge keys per table in TableSizeReader so stale
  tier-suffixed gauges (tableName.tierKey) are removed when tiers
  disappear or table is deleted via SegmentStatusChecker cleanup
- Resolve CLP V2 actual compression type (ZSTANDARD) from the
  CompressionCodec before falling back to ForwardIndexConfig's
  chunkCompressionType, which maps all CLP variants to PASS_THROUGH
Extract resolveCompressionType() as a shared utility in ForwardIndexType
that correctly maps CLP codec variants to their actual compression types
(CLPV2/CLPV2_ZSTD → ZSTANDARD, CLPV2_LZ4 → LZ4, CLP → PASS_THROUGH).
This fixes ForwardIndexHandler using incorrect compression types during
codec changes and dict-to-raw conversions. BaseSegmentCreator now uses the
same shared method, handling nullable fieldType for schema evolution cases.

Also document CLPForwardIndexCreatorV2.getUncompressedSize() semantics:
returns pre-compression sub-stream byte total, not original UTF-8 length.
…umns

Size endpoint flag-OFF path now uses the 5-arg SegmentSizeInfo constructor
to pass through tier information, fixing storageBreakdown flattening all
segments into the "default" tier when compression stats are disabled.

Metadata endpoint aggregation now skips columns from old raw segments that
have no persisted compression codec and no dictionary, preventing zero-filled
entries from appearing in per-column compression stats.
Dictionary columns report -1 as uncompressed forward index size. When
aggregating across replicas, skip accumulation for negative sentinels
(using >= 0 guard) and reconstruct -1 in the output for dict columns
that have no real uncompressed data.

Also guard compressionStatsSummary construction on segmentsWithStats > 0
after replica division, avoiding a degenerate summary when integer
division rounds the count to zero.
…stats

Tier gauge emission was passing the tier-suffixed key (e.g.
"myTable_OFFLINE.coldTier") to isLeaderForTable(), which expects the
base table name. Hoist the leader check to use the canonical
tableNameWithType before the tier loop, then emit gauges directly.

On the server metadata endpoint, move computeIfAbsent for per-column
compression accumulators inside the conditional branches so old raw
segments without a persisted codec no longer create zero-filled entries.
- Reorder FieldConfig import in BaseSegmentCreator to satisfy spotless
- Replace ColumnMetadata.INDEX_NOT_FOUND with UNAVAILABLE (upstream rename)
- Replace deprecated RandomStringUtils.randomAlphanumeric with secure().nextAlphanumeric
…Summary

- ColumnMetadataImplTest: add two tests verifying the new compression
  stats fields (FORWARD_INDEX_UNCOMPRESSED_SIZE, FORWARD_INDEX_COMPRESSION_CODEC)
  round-trip through fromPropertiesConfiguration, and that old segments
  without those keys return UNAVAILABLE/-1 and null respectively.
- CompressionStatsSummaryTest: new test class covering all getters,
  full/partial coverage flag, JSON round-trip, and unknown-field
  backward compatibility for the new DTO.
- ForwardIndexTypeTest: add testResolveCompressionType covering all 6 branches
  (CLP→PASS_THROUGH, CLPV2/CLPV2_ZSTD→ZSTANDARD, CLPV2_LZ4→LZ4, regular codec,
  field-type fallback, null fieldType)
- SegmentMetadataUtilsTest: new test verifying null-value map entries clear the
  property vs non-null entries overwriting it
- ColumnCompressionStatsInfoTest: new test covering all getters, JSON round-trip,
  null codec/indexes, hasDictionary, and unknown-field tolerance
- StorageBreakdownInfoTest: new test covering TierInfo getters, multi-tier
  round-trip, empty map, and unknown-field tolerance on both types
- TablesResourceTest: verify columnCompressionStats and compressionStats are null
  when compressionStatsEnabled is false
- TableSizeResourceTest: verify columnCompressionStats is null per-segment when
  compressionStatsEnabled is false
…rageBreakdown aggregation and dict sentinel path
…mixed-age tables

When a table has segments built before and after `noDictionaryColumns` was added to
the config, `TablesResource` was overwriting `columnHasDictMap` on each segment, so
the last segment iterated determined the reported `hasDictionary` value. If a new
raw segment happened to be processed before an old dict segment, the column would
be incorrectly flagged as `hasDictionary=true`, causing the table-level
`compressionStats` summary to be omitted.

Fix: use `merge` with `&&` (false wins) so that once any segment in the table marks
a column as raw, the per-table aggregated value is `false`. In the raw branch,
use the literal `false` directly since `codec != null && uncompressedSize > 0`
already proves the column has no dictionary.

Also increase ZkStarter initial wait (50ms → 2000ms + 500ms retry) to avoid
ZK probe failures on slow CI machines.
@johnsolomonj johnsolomonj force-pushed the feature/compression-stats-tracking branch from 88a9887 to ad9d6da Compare May 19, 2026 15:04
The 2000ms initial sleep and 50ms→500ms retry increase were added
during local quickstart debugging and should not be in this PR.
@johnsolomonj johnsolomonj force-pushed the feature/compression-stats-tracking branch from ad9d6da to 9c560a0 Compare May 19, 2026 15:11
…ionType NPE risk

ColumnCompressionStatsInfo.isHasDictionary() was serialized by Jackson as
\"isHasDictionary\" while @JsonCreator bound deserialization to \"hasDictionary\",
breaking round-trip consistency. Rename getter to hasDictionary() and add
@JsonProperty(\"hasDictionary\") so serialization and deserialization use the same
wire key. Update all call sites.

ForwardIndexHandler called compressionType.name() on the return of
resolveCompressionType() without a null check, violating the method's documented
contract. Wrap both call sites in null guards, matching BaseSegmentCreator's pattern.
…t SPI

The method was abstract, which would break any external IndexCreationContext
implementor on upgrade. Provide a default returning false (feature-off) to
preserve backward compatibility for plugins and external implementations.

Also document the mutable CompressionStats/StorageBreakdown inner classes as
intentional accumulators separate from the immutable pinot-common DTOs.
…oveTableGauge(name, key, gauge)

Tier storage gauges were emitted as setValueOfTableGauge(tableName + "." + tierKey, gauge, value),
concatenating the tier key into the tableName slot. This produces a non-canonical gauge name that
is invisible to any code querying gauges by table name. Use the correct 3-argument overloads
setOrUpdateTableGauge(tableName, tierKey, gauge, value) and removeTableGauge(tableName, tierKey, gauge)
throughout emitTierMetrics and clearTierMetrics.

The _trackUncompressedSize default-flip (C4.13) is deferred: writers are used directly in tests and
tooling without going through ForwardIndexCreatorFactory, so changing the default breaks those callers.
The existing factory-based path already sets the flag correctly via setTrackUncompressedSize(context.isCompressionStatsEnabled()).
@johnsolomonj johnsolomonj force-pushed the feature/compression-stats-tracking branch from 0bf95a3 to 0741c48 Compare May 19, 2026 19:41
@xiangfu0 xiangfu0 added release-notes Referenced by PRs that need attention when compiling the next release notes observability Related to observability (logging, tracing, metrics) feature New functionality labels May 20, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feature New functionality observability Related to observability (logging, tracing, metrics) release-notes Referenced by PRs that need attention when compiling the next release notes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants